{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# COMPSCI 389: Introduction to Machine Learning\n", "# Classification Example\n", "\n", "In this notebook we bring together the tools we've learned to train a classifier for a challenging computer vision task. We use the [CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html) data set. This data set contains 32x32 pixel color images of objects from 100 classes, with 600 images per class. You can find a list of the possible classes by following the link to CIFAR-100 above (scroll down to CIFAR-100, as the top of the page describes CIFAR-10).\n", "\n", "First, we will use these imports:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torchvision\n", "import torchvision.transforms as transforms\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "import torch.nn.functional as F" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load and preprocess CIFAR-100 data set. First, set up some transformations.\n", "\n", "The first,\n", "> transforms.ToTensor()\n", "converts the images, represented as NumPy arrays, into PyTorch tensors. It scales pixel intensities from $[0,255]$ to $[0.0, 1.0]$, and changes the order of dimensions from (heigth, width, channels) to (channels, height, width).\n", "\n", "The second,\n", "> transforms.Normalize\n", "adjusts the color channels of the images (now tensors), performing a form of normalization. It re-scales the red, green, and blue channels from $[0,1]$ to $[-1,1]$. The first argument, (0.5, 0.5, 0.5), indicates that $0.5$ will be subtracted from each channel (red, green, and blue). The second argument, also $(0.5, 0.5, 0.5)$ indicates that each channel should be divided by $0.5$ (i.e., mulitplied by two)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "transform = transforms.Compose(\n", " [transforms.ToTensor(),\n", " transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we download the data sets, placing them in a \"data\" directory (where the GPA.csv file is, if you use a local copy). We also set up DataLoaders for the training and testing sets. We do this with the following lines:\n", "```\n", "trainset = torchvision.datasets.CIFAR100(root='./data', train=True, download=True, transform=transform)\n", "trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)\n", "\n", "testset = torchvision.datasets.CIFAR100(root='./data', train=False, download=True, transform=transform)\n", "testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)\n", "```\n", "\n", "First, notice the trainset and testset lines that download the data set.\n", "1. The `train` argument specifies whether to get the training or testing data.\n", "2. The `download=True` argument specifies that the data should be downloaded from the internet if it's not already present.\n", "3. The `transform=transform` argument applies the transformations that we defined in the previous code block.\n", "\n", "Next, notice the trainloader and testloader lines, which prepare the `DataLoader` objects.\n", "1. We use mini-batches for both the training and testing sets. For testing, we aggregate the loss across all of the mini-batches. Still, mini-batching helps because there may not be enough GPU memory to store the entire testing set.\n", "2. 
"2. We shuffle the training data when forming batches during each epoch, but not the testing data. There is no need to shuffle the testing set, since we will be aggregating the loss over the entire testing set anyway. Shuffling the training data is necessary so that the gradient updates are computed from different random sets of batches each epoch.\n", "3. The `num_workers=2` argument sets the number of worker subprocesses to use for data loading. This is particularly useful for larger data sets.\n", "\n", "Lastly, we get a list of the possible classes with\n", "> classes = trainset.classes" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ./data\cifar-100-python.tar.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100.0%\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Extracting ./data\cifar-100-python.tar.gz to ./data\n", "Files already downloaded and verified\n" ] } ], "source": [ "trainset = torchvision.datasets.CIFAR100(root='./data', train=True,\n", " download=True, transform=transform)\n", "trainloader = torch.utils.data.DataLoader(trainset, batch_size=16,\n", " shuffle=True, num_workers=2)\n", "\n", "testset = torchvision.datasets.CIFAR100(root='./data', train=False,\n", " download=True, transform=transform)\n", "testloader = torch.utils.data.DataLoader(testset, batch_size=16,\n", " shuffle=False, num_workers=2)\n", "\n", "classes = trainset.classes" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Define a convolutional neural network (CNN) model. We have discussed the linear layers in our previous notebook and in class (the first argument is the number of inputs and the second is the number of outputs).\n", "\n", "`Conv2d` represents a convolutional layer for a 2-dimensional image. The first argument is the number of input channels (3 for red, green, and blue), the second is the number of filters (output channels), and the third is the patch size (kernel size).\n", "\n", "`MaxPool2d` represents a pooling layer that performs max-pooling with a 2x2 window size and a stride of 2. The first argument is the window size, and the second is the stride.\n", "\n", "`view(-1, 16 * 5 * 5)` represents a flattening layer.\n", "\n", "Notice that the second convolutional layer takes 6 channels as input (the 6 channels/filters from the first convolutional layer).\n", "\n", "Notice that the first linear layer has $16\times 5 \times 5$ inputs. This corresponds to the 16 filters from the previous convolutional layer, each of size $5 \times 5$. The size can be computed as follows. We began with a 32x32 image. The first convolution has a kernel size of 5, reducing the size to 28x28 (a 5x5 patch fits in $32 - 5 + 1 = 28$ positions along each dimension, since the first and last patches touch the edges of the image). The pooling layer reduces this to 14x14 (since it uses a 2x2 patch with a stride of 2). The second convolution reduces this from 14x14 to 10x10 (again, a convolution with a 5x5 patch). The last pooling layer reduces this again to 5x5. Note that this computation can be difficult for larger architectures, and so online tools exist to perform these calculations for you (search for \"convolutional network calculator\"); a small sanity-check sketch also appears at the end of this cell.\n", "\n", "Notice that we use F.relu, from our earlier import, `import torch.nn.functional as F`. This is similar to using `nn.ReLU()` as in our previous example. Either approach (using `F.relu` or `nn.ReLU`) is reasonable."
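, "\n", "\n", "As a quick sanity check of the size arithmetic above, here is a small sketch. It simply applies the two rules used in that computation: a convolution with kernel size $k$ (stride 1, no padding) maps an input of width $n$ to width $n - k + 1$, and a 2x2 max-pool with stride 2 halves the width. The helper name `feature_map_size` is ours, purely for illustration; it is not part of PyTorch or of the model below.\n", "```\n", "def feature_map_size(n, kernel=5, pool=2):\n", "    n = n - kernel + 1  # convolution with a kernel x kernel patch, stride 1, no padding\n", "    n = n // pool       # max-pooling with a pool x pool window and a stride equal to pool\n", "    return n\n", "\n", "after_first = feature_map_size(32)            # 32 -> 28 -> 14\n", "after_second = feature_map_size(after_first)  # 14 -> 10 -> 5\n", "print(after_second)  # prints 5, matching the 16 * 5 * 5 inputs to fc1\n", "```"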
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Step 2: Define a CNN\n", "class Net(nn.Module):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.conv1 = nn.Conv2d(3, 6, 5)\n", " self.pool = nn.MaxPool2d(2, 2)\n", " self.conv2 = nn.Conv2d(6, 16, 5)\n", " self.fc1 = nn.Linear(16 * 5 * 5, 120)\n", " self.fc2 = nn.Linear(120, 84)\n", " self.fc3 = nn.Linear(84, 100)\n", "\n", " def forward(self, x):\n", " x = self.pool(F.relu(self.conv1(x)))\n", " x = self.pool(F.relu(self.conv2(x)))\n", " x = x.view(-1, 16 * 5 * 5)\n", " x = F.relu(self.fc1(x))\n", " x = F.relu(self.fc2(x))\n", " x = self.fc3(x)\n", " return x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create the network." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "net = Net()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select the loss function and optimizer." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "criterion = nn.CrossEntropyLoss()\n", "optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up for GPU training:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "device(type='cuda', index=0)" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "Net(\n", " (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))\n", " (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", " (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n", " (fc1): Linear(in_features=400, out_features=120, bias=True)\n", " (fc2): Linear(in_features=120, out_features=84, bias=True)\n", " (fc3): Linear(in_features=84, out_features=100, bias=True)\n", ")" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", "display(device)\n", "net.to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train the network:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2000] loss: 4.602\n", "[2, 2000] loss: 4.276\n", "[3, 2000] loss: 3.820\n", "[4, 2000] loss: 3.569\n", "[5, 2000] loss: 3.380\n", "[6, 2000] loss: 3.226\n", "[7, 2000] loss: 3.116\n", "[8, 2000] loss: 3.020\n", "[9, 2000] loss: 2.929\n", "[10, 2000] loss: 2.862\n", "[11, 2000] loss: 2.805\n", "[12, 2000] loss: 2.727\n", "[13, 2000] loss: 2.677\n", "[14, 2000] loss: 2.638\n", "[15, 2000] loss: 2.589\n", "[16, 2000] loss: 2.550\n", "[17, 2000] loss: 2.505\n", "[18, 2000] loss: 2.456\n", "[19, 2000] loss: 2.430\n", "[20, 2000] loss: 2.389\n", "[21, 2000] loss: 2.368\n", "[22, 2000] loss: 2.325\n", "[23, 2000] loss: 2.304\n", "[24, 2000] loss: 2.281\n", "[25, 2000] loss: 2.257\n", "[26, 2000] loss: 2.235\n", "[27, 2000] loss: 2.204\n", "[28, 2000] loss: 2.178\n", "[29, 2000] loss: 2.163\n", "[30, 2000] loss: 2.154\n", "[31, 2000] loss: 2.132\n", "[32, 2000] loss: 2.100\n", "[33, 2000] loss: 2.089\n", "[34, 2000] loss: 2.063\n", "[35, 2000] loss: 2.052\n", "[36, 2000] loss: 2.044\n", "[37, 2000] loss: 2.020\n", "[38, 2000] loss: 2.003\n", "[39, 2000] loss: 1.994\n", "[40, 2000] loss: 1.976\n", "[41, 2000] loss: 1.971\n", "[42, 2000] loss: 1.965\n", "[43, 2000] loss: 1.940\n", "[44, 2000] loss: 
1.941\n", "[45, 2000] loss: 1.909\n", "[46, 2000] loss: 1.914\n", "[47, 2000] loss: 1.890\n", "[48, 2000] loss: 1.896\n", "[49, 2000] loss: 1.879\n", "[50, 2000] loss: 1.882\n", "[51, 2000] loss: 1.848\n", "[52, 2000] loss: 1.838\n", "[53, 2000] loss: 1.833\n", "[54, 2000] loss: 1.823\n", "[55, 2000] loss: 1.827\n", "[56, 2000] loss: 1.807\n", "[57, 2000] loss: 1.803\n", "[58, 2000] loss: 1.805\n", "[59, 2000] loss: 1.775\n", "[60, 2000] loss: 1.782\n", "[61, 2000] loss: 1.762\n", "[62, 2000] loss: 1.763\n", "[63, 2000] loss: 1.755\n", "[64, 2000] loss: 1.750\n", "[65, 2000] loss: 1.748\n", "[66, 2000] loss: 1.741\n", "[67, 2000] loss: 1.725\n", "[68, 2000] loss: 1.720\n", "[69, 2000] loss: 1.719\n", "[70, 2000] loss: 1.724\n", "[71, 2000] loss: 1.718\n", "[72, 2000] loss: 1.701\n", "[73, 2000] loss: 1.688\n", "[74, 2000] loss: 1.687\n", "[75, 2000] loss: 1.685\n", "[76, 2000] loss: 1.678\n", "[77, 2000] loss: 1.672\n", "[78, 2000] loss: 1.677\n", "[79, 2000] loss: 1.661\n", "[80, 2000] loss: 1.659\n", "[81, 2000] loss: 1.665\n", "[82, 2000] loss: 1.656\n", "[83, 2000] loss: 1.646\n", "[84, 2000] loss: 1.632\n", "[85, 2000] loss: 1.632\n", "[86, 2000] loss: 1.634\n", "[87, 2000] loss: 1.652\n", "[88, 2000] loss: 1.628\n", "[89, 2000] loss: 1.620\n", "[90, 2000] loss: 1.620\n", "[91, 2000] loss: 1.613\n", "[92, 2000] loss: 1.603\n", "[93, 2000] loss: 1.610\n", "[94, 2000] loss: 1.611\n", "[95, 2000] loss: 1.602\n", "[96, 2000] loss: 1.601\n", "[97, 2000] loss: 1.608\n", "[98, 2000] loss: 1.590\n", "[99, 2000] loss: 1.593\n", "[100, 2000] loss: 1.594\n" ] } ], "source": [ "for epoch in range(100): # loop over the dataset multiple times\n", " running_loss = 0.0\n", " for i, data in enumerate(trainloader, 0):\n", " # get the inputs; data is a list of [inputs, labels]\n", " inputs, labels = data[0].to(device), data[1].to(device)\n", "\n", " # zero the parameter gradients\n", " optimizer.zero_grad()\n", "\n", " # forward + backward + optimize\n", " outputs = net(inputs)\n", " loss = criterion(outputs, labels)\n", " loss.backward()\n", " optimizer.step()\n", "\n", " # print statistics\n", " running_loss += loss.item()\n", " if i % 2000 == 1999: # print every 2000 mini-batches\n", " print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')\n", " running_loss = 0.0" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Note: The above code took 23 minutes on my RTX 2070. We can now evaluate the network:" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy of the network on the 10000 test images: 25 %\n" ] } ], "source": [ "correct = 0\n", "total = 0\n", "with torch.no_grad():\n", " for data in testloader:\n", " inputs, labels = data[0].to(device), data[1].to(device)\n", " outputs = net(inputs)\n", " _, predicted = torch.max(outputs.data, 1)\n", " total += labels.size(0)\n", " correct += (predicted == labels).sum().item()\n", "\n", "print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "An accuracy of 25% isn't particularly good, but it's far better than random. With 100 classes, the random classifier has an accuracy of approximately 1%. This data set has been so heavily studied and optimized that methods can achieve up to 96.08% accuracy [[link](https://paperswithcode.com/sota/image-classification-on-cifar-100)]!"
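, "\n", "\n", "For problems with this many classes, it is also common to report top-5 accuracy: the fraction of test images whose true class is among the network's five highest-scoring classes. The sketch below is an optional extra evaluation (not part of the original example); it assumes the `net`, `testloader`, and `device` defined above are still available.\n", "```\n", "correct_top5 = 0\n", "total = 0\n", "with torch.no_grad():\n", "    for data in testloader:\n", "        inputs, labels = data[0].to(device), data[1].to(device)\n", "        outputs = net(inputs)\n", "        # Indices of the 5 highest-scoring classes for each image\n", "        _, top5 = outputs.topk(5, dim=1)\n", "        # A prediction counts as correct if the true label is among the top 5\n", "        correct_top5 += (top5 == labels.unsqueeze(1)).any(dim=1).sum().item()\n", "        total += labels.size(0)\n", "\n", "print(f'Top-5 accuracy on the {total} test images: {100 * correct_top5 / total:.1f} %')\n", "```"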
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 2 }